AUDIO TESTING IN A PACKET SWITCHED NETWORK



BACKGROUND 

This invention relates to audio testing in a packet
switched network.

Audio testing is useful, for example, in Internet
telephony, in which telephone-like voice conversations are
digitized by personal computers for transmission over the
Internet either to other personal computers, where they are
reconverted to analog audio, or through an Internet Telephony
Service Provider (ITSP) and the public switched telephone
network (PSTN) to conventional telephony equipment. Audio
quality of Internet telephony is affected by time delays
caused during transmission over the Internet, packet loss,
data retransmissions, and network jitter.

DESCRIPTION OF DRAWINGS

Figure 1 is a block diagram of a test system.

Figure 2 is a block diagram of an audio analyzer.

Figure 3 is a flow chart illustrating one embodiment of a
process in which the audio analyzer quantifies the audio
transmission qualities of network-based telephony
applications.

Figure 4 is a flow chart illustrating one embodiment of a
process in which the audio analyzer generates peak-based
envelope waveforms.

Figures 5 and 6 are plots of an input audio signal and an
output audio signal, respectively.

Figure 7 is a plot illustrating envelope waveforms
generated from the input audio signal and the output audio
signal.

Figure 8 is a plot illustrating a summary data loss
waveform.

DESCRIPTION 

Conventional communication protocols, such as the H.323
protocol, provide standards for audio, video, and data
communications across packet-based networks, including the
Internet. See "H.323: The Multimedia Communications Standard
for Local Area Networks," IEEE Communications Magazine,
Vol. 34, No. 12, 1996, pp. 52-56. By complying with these
standards, multimedia products and applications can
interoperate, allowing users to communicate without concern
for compatibility. These communication protocols also
establish a variety of standards for digitizing and
compressing speech, which reflect tradeoffs between speech
quality, bit rate, computer power, and signal delay. For
example, H.323-compliant devices support a G.723
coder/decoder (codec) for speech compression that is designed
to transmit audio data at low speeds such as 56 kbps.

Figure 1 is a block diagram illustrating an audio test
environment 100 for analyzing and quantifying audio data
losses that occur during telephony calls over a
packet-switched network 108 such as the Internet. Test
environment 100 can be used to analyze a variety of audio
transmission qualities, such as latency, audio data loss and
frequency response, that are experienced during telephony
calls between packet-based telephony equipment such as
computers, Internet telephones, and even radio-frequency (RF)
communication devices.

Transmit device 104 and receive device 110 are
telephony-enabled devices, such as computers, hand-held
personal digital assistants (PDAs) and Internet telephones,
that are capable of supporting a telephony communication
session over network 108. In one implementation, transmit
device 104 and receive device 110 are general-purpose
computers acting as hosts for telephony software and hardware
to be tested. Network 108 represents any packet-switched
network, such as the Internet, and communicatively couples
transmit device 104 and receive device 110. Transmit device
104 and receive device 110 may be connected to network 108 by
a variety of means such as network cards, digital subscriber
line (DSL) modems, cable modems, and conventional modems
accessing network 108 via Internet Service Providers.

Transmit device 104 communicates audio data packets 120
to receive device 110 via network 108. Transmit device 104
includes a codec (not shown) for compressing the digitized
input audio signal 106 for data packet transmission.
Receive device 110 receives audio data packets 120 from
network 108, decompresses the compressed audio data using an
internal codec (not shown) and converts the audio data packets
120 into output audio signal 112, which can be used to drive a
handset or a speaker (not shown).

The codecs of transmit device 104 and receive device 110
can be implemented in software, hardware, or a combination
thereof, and buffer audio data packets 120 for a fixed time
duration, referred to herein as the buffer length. The buffer
length is a function of the type of codec. For example, the
G.723 codec defines an audio buffer length of 30 ms. Other
conventional codecs include the G.711 and G.729, which have
buffer durations of 120 ms and 10 ms, respectively.

Audio generator 102 represents any device suitable for
producing input audio signal 106. For example, in one
configuration audio generator 102 is a computer system having
a high-quality audio card and an audio editor software
program, such as Cool Edit™ from Syntrillium Software™,
suitable for modifying existing audio files and creating
custom audio files that are digital representations of analog
audio signals having specific characteristics. Using the audio
editor, a user can generate and modify audio files, for
example by adding a trigger signal.

In one implementation, three different types of wave
files are used for testing transmit device 104 and receive
device 110: continuous, alternating, and "pink noise." A
continuous wave file uses a single audio channel to stream
audio to transmit device 104. An alternating wave file uses a
dual audio channel to stream audio signals and is useful to
test whether receive device 110 correctly detects silence on a
given channel. Pink noise wave files are used to measure the
frequency response of the codecs within transmit device 104
and receive device 110 and consist of white noise that has
been modified with a pinking filter. The pinking filter is
used to create noise that has equal energy per octave.
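
As a rough illustration of noise with approximately equal energy
per octave, the sketch below uses the Voss-McCartney algorithm. This
is a substitute technique chosen for brevity, not the pinking filter
applied to white noise that the text describes, and the function
name, row count, and seed parameter are assumptions made for this
example only.

```python
import random


def pink_noise(n_samples, n_rows=16, seed=0):
    """Approximate pink (1/f) noise with the Voss-McCartney algorithm.

    Row k is refreshed only when the sample counter has exactly k
    trailing zero bits, so each row changes half as often as the one
    before it. Summing the rows gives roughly equal energy per octave.
    """
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    total = sum(rows)
    samples = []
    for i in range(1, n_samples + 1):
        k = (i & -i).bit_length() - 1  # number of trailing zero bits in i
        if k < n_rows:
            total -= rows[k]
            rows[k] = rng.uniform(-1.0, 1.0)
            total += rows[k]
        samples.append(total / n_rows)  # scale back into [-1, 1]
    return samples
```

The output is a list of floating-point samples that would still need
to be written to a wave file before it could drive transmit device
104.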

Audio generator 102 adds a triggering signal to the wave
files such that an audio analyzer 114 can synchronize input
audio signal 106 with output audio signal 112. For example, a
0 dB amplitude, 10 cycle 220 Hz sine wave signal is used for
triggering audio analyzer 114 during latency and data loss
measurements.
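
The trigger burst described above can be sketched as follows. The
sample rate, the interpretation of 0 dB as full-scale amplitude 1.0,
and the function name are assumptions for illustration; the text
specifies only the cycle count and the frequency.

```python
import math


def make_trigger(sample_rate_hz=44100, freq_hz=220.0, cycles=10, amplitude=1.0):
    """Return samples for a short sine burst (default: 10 cycles at 220 Hz).

    amplitude=1.0 is assumed here to correspond to 0 dB full scale.
    """
    n_samples = round(cycles * sample_rate_hz / freq_hz)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]
```

Ten cycles at 220 Hz last about 45 ms, so the burst is short relative
to a typical test file but longer than a single 30 ms G.723 buffer.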

In one implementation, audio generator 102 includes a
speaker (not shown) to produce audible sound as a function of
the generated audio signal. A microphone proximate to the
speaker converts the sound generated by the speaker into input
audio signal 106. In another implementation, audio generator
102 provides the generated audio signal directly to an input
jack of a sound card within transmit device 104.

Audio analyzer 114 captures input audio signal 106 and
output audio signal 112 and helps users objectively determine
whether audio data packets 120 experienced any data loss.
More specifically, audio analyzer 114 compares the captured
audio signals 106 and 112 in order to quantify the received
audio performance quality between two telephony end points,
i.e., transmit device 104 and receive device 110.

Figure 2 is a block diagram illustrating one embodiment
of audio analyzer 114. Audio analyzer 114 is a computer
system having a multi-channel dynamic signal analyzer (DSA)
202 and audio test software including configuration module
204, user interface 206, analysis module 208, acquisition
module 210 and file management module 212.

DSA 202 is a computer-based Fast Fourier Transform (FFT)
dynamic signal analyzer, such as the NI 4551 PCI Dynamic
Signal Analyzer from National Instruments™, that delivers fast
spectrum analysis, network analysis, and transient event
analysis of sampled time-domain waveforms. DSA 202 can
acquire time-varying signals and compute the frequency
spectrum of the signals using Fourier analysis. DSA 202 has
two inputs, channel A and channel B, that are used to receive
and monitor input audio signal 106 and output audio signal
112, respectively.

User interface 206 provides a graphical interface by
which a user can control configuration module 204, analysis
module 208, acquisition module 210 and file management module
212. In addition, user interface 206 displays in real time
the data generated by the different analysis functions of
analysis module 208. User interface 206 generates a variety
of data plots and numerical displays for assessing various
transmission qualities of audio data packets 120 such as data
loss, latency and frequency response.

Configuration module 204 allows the user to configure
acquisition module 210 and analysis module 208 in real time
while capturing and displaying the acquired data. In
addition, configuration module 204 controls various display
settings within user interface 206. Configuration module 204
stores the settings such that the user can quickly configure
audio analyzer 114 in response to different acquisition
scenarios.






Attorney Docket: 10559/199001/P8371 

Acquisition module 210 allows the user to start and stop
data acquisition, initialize DSA 202 and configure DSA 202 for
appropriate triggering such that the input audio signal 106
and output audio signal 112 can be synchronized. Acquisition
module 210 also monitors DSA 202 and handles any errors
generated during acquisition.

Analysis module 208 analyzes in real time the data
acquired by DSA 202 and converts the acquired data to graphic
and numeric representations for plotting by user interface
206. As explained in detail below, analysis module 208
supports analysis of a variety of transmission qualities
including latency, data loss, frequency response, and volume
verification.

File management module 212 allows the user to save data
for exporting or future viewing. For example, file management
module 212 allows the user to save plotted data points in
ASCII file format with tab delimiters. The user can open a
previously saved plot file for a static re-plot of the data
and can save text data, such as the latency and data loss, to
perform off-line analysis. Also, the user can save and
restore the configuration settings to support repeatable
acquisition processes.
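
A tab-delimited ASCII plot file of the kind described above might be
written and read back as in the following sketch. The column names
`time_ms` and `value` and both function names are hypothetical; the
text does not specify the file layout beyond tab delimiters.

```python
import csv
import io


def write_plot_points(points, stream):
    """Write (time, value) pairs as tab-delimited ASCII text."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["time_ms", "value"])  # hypothetical header row
    for time_ms, value in points:
        writer.writerow([time_ms, value])


def read_plot_points(stream):
    """Read back a file produced by write_plot_points for a static re-plot."""
    reader = csv.reader(stream, delimiter="\t")
    next(reader)  # skip the header row
    return [(float(time_ms), float(value)) for time_ms, value in reader]


# Usage: round-trip two plot points through an in-memory text buffer.
buffer = io.StringIO()
write_plot_points([(0.0, 0.25), (7.5, 0.0)], buffer)
buffer.seek(0)
restored = read_plot_points(buffer)
```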

Audio analyzer 114 can be implemented in a computer, or a
dedicated analysis tool, comprising digital electronic
circuitry, computer hardware, firmware, software, or a
combination of them. In addition, the testing process of the
invention can be implemented in a machine-readable article
storing instructions for execution by a programmable
processor.

Figure 3 is a flow chart illustrating one embodiment of a
process, suitable for implementation in a computer program,
in which audio analyzer 114 of test environment 100 analyzes
and quantifies the audio transmission qualities of telephony
applications hosted by transmit device 104 and receive device
110.

Initially, audio generator 102 processes a stored audio
file and drives transmit device 104 with input audio signal
106 (302). Transmit device 104 digitizes input audio signal
106 and generates compressed audio data packets 120 (304).
Transmit device 104 communicates audio data packets 120 over
network 108 to receive device 110 as a stream of data packets.
Receive device 110 converts the audio data packets of data
stream 120 to analog form and produces output audio signal 112
(306).

Audio analyzer 114 captures input audio signal 106 and
output audio signal 112 and generates a corresponding
peak-based envelope waveform for each captured audio signal
(308). Each envelope waveform has a resolution that is a
function of an audio buffer size determined by the particular
codecs used by the devices under test. For example, in one
embodiment, the resolution is set to 25 percent of the audio
buffer size. For G.723, audio buffers are 30 ms in duration so
the resulting envelope resolution is set to 7.5 ms. Other
conventional codecs such as the G.711 and G.729 have buffer
durations of 120 ms and 10 ms, respectively. The resolution
for these buffers would be 30 ms and 2.5 ms, respectively.
The enveloping process makes it easier to identify true data
loss by filtering out false data-loss indications.
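
The resolution arithmetic above can be written out as a small
sketch. The buffer durations are the ones stated in the text; the
table and function names are hypothetical.

```python
# Buffer durations in milliseconds, as stated in the description.
CODEC_BUFFER_MS = {"G.723": 30.0, "G.711": 120.0, "G.729": 10.0}


def envelope_resolution_ms(codec, fraction=0.25):
    """Envelope resolution as a fraction (default 25%) of the codec buffer."""
    return CODEC_BUFFER_MS[codec] * fraction
```

With the default fraction, each audio buffer is covered by four
envelope data points regardless of the codec.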

After generating the envelopes, audio analyzer 114
analyzes the envelope waveforms to determine audio
transmission qualities, such as data loss and latency, of
telephony applications and hardware hosted by transmit device
104 and receive device 110 (310). For example, analysis
module 208 of audio analyzer 114 calculates the audio latency
between transmit device 104 and receive device 110 by
measuring the latency between the triggering signals present
within input audio signal 106 and output audio signal 112. An
additional feature is frequency response analysis for
transmit device 104 and receive device 110. In addition, in
order to verify the volume of the transmission, audio analyzer
114 calculates and displays a voltage magnitude for each
envelope.
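
One way to measure latency between the two trigger signals is
sketched below. It assumes that both signals were captured
simultaneously on a shared sample clock and that the trigger can be
located by a simple amplitude threshold; the threshold value and the
function names are assumptions, since the text does not describe the
detection method.

```python
def find_trigger_start(samples, threshold):
    """Return the index of the first sample whose magnitude reaches threshold."""
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            return index
    return None


def latency_ms(input_samples, output_samples, sample_rate_hz, threshold=0.5):
    """Latency between trigger onsets in the input and output captures."""
    start_in = find_trigger_start(input_samples, threshold)
    start_out = find_trigger_start(output_samples, threshold)
    if start_in is None or start_out is None:
        return None  # trigger not found in one of the captures
    return (start_out - start_in) * 1000.0 / sample_rate_hz
```

A trigger that appears 80 samples later in the output than in the
input, at an 8000 Hz sample rate, corresponds to 10 ms of latency.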




Audio analyzer 114 summarizes the envelopes by
subtracting the output envelope data from the input envelope
data to indicate lost data envelopes (312). The summary
envelope waveform is voltage scaled to filter out the
undesirable envelopes due to misalignment and any phase
differences between waveforms. In one implementation the
resolution of the envelope is set to 25% of the buffer size of
the codecs of transmit device 104 and receive device 110 such
that four consecutive data points within the summary envelope
waveform indicate a loss of an audio buffer and a data packet
within data stream 120.
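
The four-point rule above could be applied to a summary envelope as
in the following sketch. It assumes that any value greater than zero
marks a lost envelope point and that each complete run of four such
points counts as one lost buffer; the function name is hypothetical.

```python
def count_lost_buffers(summary, points_per_buffer=4):
    """Count runs of consecutive nonzero summary points, one buffer per run."""
    lost = 0
    run = 0
    for value in summary:
        if value > 0:
            run += 1
            if run == points_per_buffer:
                lost += 1
                run = 0  # start counting the next buffer
        else:
            run = 0  # an isolated shorter run is treated as noise
    return lost
```

Runs shorter than four points are ignored, which matches the idea
that partial mismatches come from misalignment rather than from a
dropped packet.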

In generating the summary envelope, audio analyzer 114
calculates each data point, $Sum_{env}$, of the summary
envelope according to the following equations:

$$Sum_{env} = \frac{I_{env} - (O_{env} + O_{env} \cdot C)}{I_{env} + (O_{env} + O_{env} \cdot C)}$$

$$\text{if } (Sum_{env} < 0.5 \cdot C) \text{ then } Sum_{env} = 0$$

where $I_{env}$ is the corresponding data point within the
envelope waveform for the input audio signal 106, $O_{env}$ is
the corresponding data point within the envelope waveform for
the output audio signal 112, and $C$ is a compensation factor
calculated from any difference in voltages between the audio
signals. After calculating each data point for the envelope
summary waveform according to the above equations, audio
analyzer 114 plots the envelope summary waveform to indicate
any lost data packets.
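
The summary calculation can be sketched per data point as follows.
The handling of a zero denominator (silence in both envelopes) is an
assumption added so the example runs safely; the text does not cover
that case, and the function names are hypothetical.

```python
def summary_point(i_env, o_env, c):
    """One summary data point from input/output envelope points and factor C."""
    compensated = o_env + o_env * c
    denominator = i_env + compensated
    if denominator == 0:
        return 0.0  # assumption: no signal in either envelope means no loss
    value = (i_env - compensated) / denominator
    # Values below half the compensation factor are zeroed out.
    return 0.0 if value < 0.5 * c else value


def summary_envelope(input_env, output_env, c):
    """Apply summary_point across two equal-resolution envelope waveforms."""
    return [summary_point(i, o, c) for i, o in zip(input_env, output_env)]
```

A point where the output envelope is empty while the input envelope
is not evaluates to 1, the strongest indication of loss.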

Figure 4 is a flow chart illustrating a process 400 by
which audio analyzer 114 generates peak-based envelope
waveforms from the captured input audio signal 106 and output
audio signal 112. First, audio analyzer 114 analyzes data
captured by DSA 202 and determines whether any amplitude bias
is present within input audio signal 106 before the audio
signal is transmitted by transmit device 104. If so, audio
analyzer 114 removes the amplitude bias from both input audio
signal 106 and output audio signal 112 (402). Next, audio
analyzer 114 normalizes the data generated by DSA 202 in
capturing the audio signals (404). More specifically, audio
analyzer 114 converts the raw data for captured input and
output audio signals 106 and 112 to positive values.
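
Steps 402 and 404 might look like the sketch below. Estimating the
bias as the mean of the captured samples and converting to positive
values by taking absolute values are both assumptions; the text does
not say how either step is performed.

```python
def remove_bias(samples):
    """Subtract the mean so the signal is centered on zero (assumed method)."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [sample - mean for sample in samples]


def to_positive(samples):
    """Convert samples to positive values by full-wave rectification."""
    return [abs(sample) for sample in samples]
```

Applying `remove_bias` before `to_positive` matters: rectifying a
signal that still carries an offset would distort the peaks used for
the envelope.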

After normalizing the signals, audio analyzer 114 aligns
the captured audio signals to compensate for expected latency
introduced during transmission (406). As described above,
each audio signal includes a trigger signal, such as a short,
low-frequency, high-energy burst, to support the alignment.
Audio analyzer 114 uses the trigger signal to synchronize the
capture of the transmitted and received audio signals 106 and
112 and to support the alignment process for loss analysis.
Audio analyzer 114 scans the captured data to identify the
pulses of the embedded trigger signals, thereby determining
starting positions for generating the envelope waveforms.

Next, audio analyzer 114 proceeds from the starting
positions within the captured data and generates the
enveloping waveforms as a function of the codec buffer length
used by the codecs in transmit device 104 and receive device
110.
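
A peak-based envelope of this kind can be sketched as the maximum of
the rectified samples in each window, with the window length set to a
fraction of the codec buffer length. The windowing scheme, the
parameter names, and the function name are assumptions for
illustration.

```python
def peak_envelope(rectified, sample_rate_hz, buffer_ms, fraction=0.25, start=0):
    """Return one peak value per window, beginning at the trigger position.

    The window length is fraction * buffer_ms, e.g. 7.5 ms for a 30 ms
    G.723 buffer, converted to a whole number of samples.
    """
    window = max(1, round(sample_rate_hz * buffer_ms * fraction / 1000.0))
    envelope = []
    for index in range(start, len(rectified), window):
        envelope.append(max(rectified[index:index + window]))
    return envelope
```

At an 8000 Hz sample rate and a 30 ms buffer, each window holds 60
samples, so a single high sample anywhere in a window sets that
envelope point.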

Figure 5 is a plot illustrating an example input audio
signal 500 produced by audio generator 102 and provided to
transmit device 104 for communication to receive device
110. Similarly, Figure 6 is a plot illustrating an example
output audio signal 600 generated by receive device 110 from
audio data packets 120. Figure 7 is a plot illustrating an
example input envelope waveform 700 and an example output
envelope waveform 702 generated by audio analyzer 114 from the
input audio signal 500 and output audio signal 600. Figure 8
illustrates a summary data loss envelope 800 generated from
input envelope waveform 700 and output envelope waveform 702.

The invention has been described in reference to a
variety of embodiments. Other embodiments are within the
scope of the following claims.

What is claimed is:



1. A method comprising:
    generating audio packets representing an input audio
      signal;
    communicating the audio packets over a network;
    generating an output audio signal from the communicated
      audio packets;
    generating an input envelope waveform and an output
      envelope waveform from the input audio signal and the
      output audio signal, respectively; and
    comparing the envelope waveforms.

2. The method of claim 1, wherein comparing the envelope
    waveforms includes subtracting the output envelope
    waveform from the input envelope waveform.

3. The method of claim 1, wherein comparing the envelope
    waveforms includes determining a transmission quality
    including at least one of data loss and latency.

4. The method of claim 1, wherein communicating the audio
    packets includes communicating the audio packets over the
    Internet.

5. The method of claim 1, wherein communicating the audio
    packets includes communicating the audio packets between
    telephony-enabled computers.

6. The method of claim 1, wherein generating the audio
    packets includes generating the audio packets from an
    Internet telephone.

7. The method of claim 1, wherein:
    generating the audio packets includes digitizing the
      input audio signal and compressing the digitized input
      audio signal using an input coder/decoder (codec) having
      a first buffer length,
    generating the output audio signal includes generating the
      output audio signal using an output coder/decoder (codec)
      having a second buffer length, and
    generating the envelope waveforms includes generating the
      envelope waveforms at a resolution that is a function of
      the first buffer length and the second buffer length.

8. A method comprising:
    capturing an input audio signal and an output audio
      signal associated with a network based telephony
      communication;
    generating an input envelope waveform and an output
      envelope waveform from the input audio signal and the
      output audio signal, respectively; and
    subtracting the output envelope waveform from the input
      envelope waveform to produce a summary envelope
      waveform.

9. The method of claim 8, wherein generating the input and
    output envelope waveforms includes removing a bias.

10. The method of claim 8, wherein generating the input and
    output envelope waveforms includes normalizing the
    captured input and output audio signals.

11. The method of claim 8, wherein capturing the input and
    output audio signals includes triggering the capture
    using a trigger signal embedded within the input audio
    signal.

12. The method of claim 8, wherein generating the input and
    output envelope waveforms includes aligning the captured
    input and output audio signals.

13. The method of claim 8, wherein the output audio signal
    comprises an analog signal generated from an audio data
    stream of digital packets communicated over a
    packet-based network, and further wherein the digital
    data stream is generated from the input audio signal.

14. The method of claim 13, wherein generating the input and
    output envelope waveforms includes generating the
    envelope waveforms at a resolution that is a function of
    a buffer length of coder/decoders (codecs) used in
    generating the audio data stream and the output audio
    signal.

15. An article comprising a computer-readable medium having
    computer-executable instructions stored thereon for
    causing a computer to:
    capture an input audio signal and an output audio
      signal associated with a network based telephony
      communication;
    generate an input envelope waveform and an output
      envelope waveform from the input audio signal and the
      output audio signal, respectively; and
    subtract the output envelope waveform from the input
      envelope waveform to produce a summary envelope waveform.

16. The article of claim 15, wherein the computer-executable
    instructions cause the computer to generate the input and
    output envelope waveforms by removing any amplitude bias
    in the captured signals, normalizing the captured
    signals, and aligning the captured signals using a
    trigger signal embedded within the input audio signal.

17. The article of claim 15, wherein the output audio signal
    is an analog signal generated from an audio data stream
    of digital packets communicated over a packet-based
    network, and further wherein the digital data stream is
    generated from the input audio signal.

18. The article of claim 17, wherein the computer-executable
    instructions cause the computer to generate the envelope
    waveforms at a resolution that is a function of a buffer
    length of coder/decoders (codecs) used in generating the
    audio data stream and the output audio signal.

19. A system comprising:
    a transmit device to convert an input audio signal
      to data packets;
    a receive device communicatively coupled to the
      transmit device via a packet switched network, wherein
      the receive device receives the data stream and converts
      the data stream to an output audio signal; and
    an audio analyzer coupled to the transmit device and
      the receive device, wherein the audio analyzer captures
      the input audio signal and the output audio signal, and
      further wherein the audio analyzer generates a data loss
      summary envelope from the input audio signal and the
      output audio signal.

20. The system of claim 19, wherein the transmit device
    includes a coder/decoder (codec) to convert the input
    audio signal to digital data and the receive device
    includes a coder/decoder (codec) to convert the digital
    data stream to the output audio signal, and further
    wherein the summary envelope has a resolution that is
    a function of a buffer length of the codec of the
    transmit device and a buffer length for the codec of the
    receive device.

21. The system of claim 20, wherein the codecs have equal
    buffer lengths and the resolution of the envelope
    waveforms is approximately 25% of the codec buffer
    length.

22. The system of claim 20, wherein the codecs are G.723
    codecs and the transmit device communicates the data
    stream using the H.323 protocol, and further wherein the
    buffer length is approximately 30 ms and the resolution of
    the envelope waveforms is approximately 7.5 ms.

23. The system of claim 19, wherein the network is a global
    computer network.

24. The system of claim 19, wherein the transmitting device
    or the receiving device comprises a telephony-enabled
    computer.

25. The system of claim 19, wherein the transmitting device
    or the receiving device comprises an Internet telephone.

26. The system of claim 19, wherein the audio analyzer
    further includes means for subtracting the output audio
    signal from the input audio signal to generate the
    summary data loss envelope.

27. The system of claim 19, wherein the audio analyzer
    includes a graphical user interface that displays in
    real time the summary envelope waveform and transmission
    qualities within the audio test system including latency.

28. The system of claim 19, wherein the audio analyzer
    includes a multi-channel dynamic signal analyzer for
    sampling the input audio signal and the output audio
    signal.

29. The system of claim 19 and further including an audio
    generator to generate the input audio signal from a
    stored audio file.

30. The system of claim 19, wherein the input audio signal
    includes a trigger signal having a low-frequency, high
    amplitude pulse.



AUDIO TESTING IN A PACKET SWITCHED NETWORK 



ABSTRACT 

An audio test system for analyzing and quantifying audio
data losses during network-based telephony sessions between
communication devices such as telephony-enabled computers and
Internet telephones. A transmit device converts an input
audio signal to a stream of data packets and communicates the
data stream over a network to a receive device. The receive
device converts the data stream to an output audio signal. An
audio analyzer is coupled to the transmit device and the
receive device to monitor and capture the input audio signal
and the output audio signal. The audio analyzer determines
transmission qualities for the session, such as data loss and
latency, by generating and comparing envelope waveforms of the
input audio signal and the output audio signal. In order to
increase the accuracy of the data loss analysis, the
resolution of the envelope waveforms is set as a function of
the communication protocol used to communicate the audio data
stream and a buffer length of the coder/decoders used by the
transmit device and the receive device.